Our system for annotation of articles is named “Text Detective”
Abstract
Text Detective is then able to tag every word in the sentence according to biologically relevant categories. For instance, chemical compounds are recognized and labelled. A key step in this process is the identification of “central words”, also known as “core terms” (words such as “receptor”, “kinase”, “transporter”, etc.). For this purpose, we have built a lexicon and applied a set of carefully curated rules. “Types” are also tagged (words such as “alpha”, “a1”, “c”, “12”, “TNF”), since they may define the exact identity of the gene (distinguishing between “interferon alpha” and “interferon gamma”, for instance). These “type” words are recognized by a set of carefully designed rules (presence of capital letters, numbers, Greek letters, etc.). Note that gene symbols (such as “TNFalpha”) are also tagged as “type”.
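The tagging step described above can be illustrated with a minimal sketch. It is not the Text Detective implementation: the lexicon contents, the Greek-letter list, the function names `tag_token` and `tag_sentence`, and the exact rule order are all assumptions made for illustration, based only on the cues the abstract mentions (a lexicon of core terms, plus capital letters, numbers, and Greek letters for types).

```python
import re

# Hypothetical mini-lexicon of core terms; the abstract only gives these
# three examples, and the real lexicon is not described here.
CORE_TERMS = {"receptor", "kinase", "transporter"}

# Assumed list of spelled-out Greek letters used as a "type" cue.
GREEK_LETTERS = {"alpha", "beta", "gamma", "delta", "epsilon", "kappa"}


def tag_token(token):
    """Return 'core', 'type', or 'other' for a single token (toy rules)."""
    lower = token.lower()
    if lower in CORE_TERMS:
        return "core"
    if lower in GREEK_LETTERS:
        return "type"
    # Any digit suggests a type word, e.g. "a1" or "12".
    if any(ch.isdigit() for ch in token):
        return "type"
    # A lone letter such as "c" is treated as a type; the English
    # words "a" and "I" are excluded to limit false positives.
    if len(token) == 1 and token.isalpha() and lower not in {"a", "i"}:
        return "type"
    # A capital letter after the first position suggests a symbol,
    # e.g. "TNF" or "TNFalpha"; sentence-initial capitals are ignored.
    if any(ch.isupper() for ch in token[1:]):
        return "type"
    return "other"


def tag_sentence(sentence):
    """Split a sentence into alphanumeric tokens and tag each one."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    return [(tok, tag_token(tok)) for tok in tokens]


if __name__ == "__main__":
    for token, tag in tag_sentence("interferon alpha binds the TNF receptor"):
        print(f"{token}\t{tag}")
```

In this sketch, “receptor” is found in the lexicon and tagged as a core term, while “alpha” and “TNF” are tagged as types by the Greek-letter and capital-letter rules respectively. A real system would need a far larger lexicon and context-sensitive rules; the sketch only shows how lexicon lookup and surface-form rules can be combined.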
Similar resources
News Image Annotation on a Large Parallel Text-image Corpus
In this paper, we present a multimodal parallel text-image corpus, and propose an image annotation method that exploits the textual information associated with images. Our corpus contains news articles composed of a text, images and image captions, and is significantly larger than the other news corpora proposed in image annotation papers (27,041 articles and 42,568 captioned images). In our e...
tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles
The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we ...
Learning multilingual named entity recognition from Wikipedia
We automatically create enormous, free and multilingual silver-standard training annotations for named entity recognition (NER) by exploiting the text and structure of Wikipedia. Most NER systems rely on statistical models of annotated data to identify and classify names of people, locations and organisations in text. This dependence on expensive annotation is the knowledge bottleneck our work ...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and a hybrid of the two. Machine-learning-based methods perform well for the Persian language if they are trained with good features. To get good performanc...
Automatic Semantic Web Annotation of Named Entities
This paper describes a method to perform automated semantic annotation of named entities contained in large corpora. The semantic annotation is made in the context of the Semantic Web. The method is based on an algorithm that compares the set of words that appear before and after the named entity with the content of Wikipedia articles, and identifies the most relevant one by means of a similarit...
Publication date: 2004